to go along with
Modern Data Science with R, 3rd edition by Baumer, Kaplan, and Horton
Introduction to Statistical Learning with Applications in R by James, Witten, Hastie, and Tibshirani
geom_point()
aes() function
wday
ggplot() function
aes() function
aes() function

| year | Algeria | Brazil | Colombia |
|---|---|---|---|
| 2000 | 7 | 12 | 16 |
| 2001 | 9 | 14 | 18 |

| country | Y2000 | Y2001 |
|---|---|---|
| Algeria | 7 | 9 |
| Brazil | 12 | 14 |
| Colombia | 16 | 18 |

| country | year | value |
|---|---|---|
| Algeria | 2000 | 7 |
| Algeria | 2001 | 9 |
| Brazil | 2000 | 12 |
| Brazil | 2001 | 14 |
| Colombia | 2000 | 16 |
| Colombia | 2001 | 18 |
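
The three layouts above hold the same six values. As a minimal sketch, assuming the tidyr and dplyr packages are installed, the second (wide) layout can be reshaped into the third (long) layout with `pivot_longer()`. The object names `wide` and `long` are made up for illustration, and stripping the `Y` prefix with `names_prefix` is only one of several ways to recover the year.

```r
library(tidyr)
library(dplyr)

# Hypothetical toy data matching the wide table (one row per country).
wide <- tibble(
  country = c("Algeria", "Brazil", "Colombia"),
  Y2000 = c(7, 12, 16),
  Y2001 = c(9, 14, 18)
)

# Stack every column except `country`: the old column names go into `year`
# and the cell contents go into `value`.
long <- wide |>
  pivot_longer(
    cols = -country,
    names_to = "year",
    names_prefix = "Y",                         # drop the leading "Y"
    names_transform = list(year = as.integer),  # store the year as a number
    values_to = "value"
  )

long
```

Each country contributes one row per year, so the three-row wide table becomes a six-row long table.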
# (a)
babynames |>
  group_by(year, sex) |>
  summarize(totalBirths = sum(num))

# (b)
group_by(babynames, year, sex) |>
  summarize(totalBirths = sum(num))

# (c)
group_by(babynames, year, sex) |>
  summarize(totalBirths = mean(num))

# (d)
temp <- group_by(babynames, year, sex)
summarize(temp, totalBirths = sum(num))

# (e)
summarize(group_by(babynames, year, sex),
          totalBirths = sum(num))

filter()
arrange()
select()
mutate()
group_by()

(year, sex)
(year, name)
(year, num)
(sex, name)
(sex, num)

n_distinct(name)
n_distinct(n)
sum(name)
sum(num)
mean(num)

babynames <- babynames::babynames |>
  rename(num = n)

babynames |>
  filter(name %in% c("Jane", "Mary")) |>  # just the Janes and Marys
  group_by(name, year) |>                 # for each year for each name
  summarize(total = sum(num))

# A tibble: 276 × 3
# Groups:   name [2]
   name   year total
   <chr> <dbl> <int>
 1 Jane   1880   215
 2 Jane   1881   216
 3 Jane   1882   254
 4 Jane   1883   247
 5 Jane   1884   295
 6 Jane   1885   330
 7 Jane   1886   306
 8 Jane   1887   288
 9 Jane   1888   446
10 Jane   1889   374
# ℹ 266 more rows

babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  group_by(name, year) |>
  summarize(number = sum(num))

# A tibble: 276 × 3
# Groups:   name [2]
   name   year number
   <chr> <dbl>  <int>
 1 Jane   1880    215
 2 Jane   1881    216
 3 Jane   1882    254
 4 Jane   1883    247
 5 Jane   1884    295
 6 Jane   1885    330
 7 Jane   1886    306
 8 Jane   1887    288
 9 Jane   1888    446
10 Jane   1889    374
# ℹ 266 more rows

babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  group_by(name, year) |>
  summarize(n_distinct(name))

# A tibble: 276 × 3
# Groups:   name [2]
   name   year `n_distinct(name)`
   <chr> <dbl>              <int>
 1 Jane   1880                  1
 2 Jane   1881                  1
 3 Jane   1882                  1
 4 Jane   1883                  1
 5 Jane   1884                  1
 6 Jane   1885                  1
 7 Jane   1886                  1
 8 Jane   1887                  1
 9 Jane   1888                  1
10 Jane   1889                  1
# ℹ 266 more rows

babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  group_by(name, year) |>
  summarize(n_distinct(num))

# A tibble: 276 × 3
# Groups:   name [2]
   name   year `n_distinct(num)`
   <chr> <dbl>             <int>
 1 Jane   1880                 1
 2 Jane   1881                 1
 3 Jane   1882                 1
 4 Jane   1883                 1
 5 Jane   1884                 1
 6 Jane   1885                 1
 7 Jane   1886                 1
 8 Jane   1887                 1
 9 Jane   1888                 1
10 Jane   1889                 1
# ℹ 266 more rows

babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  group_by(name, year) |>
  summarize(sum(name))

Error in `summarize()`:
ℹ In argument: `sum(name)`.
ℹ In group 1: `name = "Jane"` and `year = 1880`.
Caused by error in `base::sum()`:
! invalid 'type' (character) of argument

babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  group_by(name, year) |>
  summarize(mean(num))

# A tibble: 276 × 3
# Groups:   name [2]
   name   year `mean(num)`
   <chr> <dbl>       <dbl>
 1 Jane   1880         215
 2 Jane   1881         216
 3 Jane   1882         254
 4 Jane   1883         247
 5 Jane   1884         295
 6 Jane   1885         330
 7 Jane   1886         306
 8 Jane   1887         288
 9 Jane   1888         446
10 Jane   1889         374
# ℹ 266 more rows

babynames |>
  filter(name %in% c("Jane", "Mary")) |>
  group_by(name, year) |>
  summarize(median(num))

# A tibble: 276 × 3
# Groups:   name [2]
   name   year `median(num)`
   <chr> <dbl>         <dbl>
 1 Jane   1880           215
 2 Jane   1881           216
 3 Jane   1882           254
 4 Jane   1883           247
 5 Jane   1884           295
 6 Jane   1885           330
 7 Jane   1886           306
 8 Jane   1887           288
 9 Jane   1888           446
10 Jane   1889           374
# ℹ 266 more rows

gdp
year
gdpval
country
-country

gdp
year
gdpval
country
-country

gdp
year
gdpval
country
-country

Wherever you are, make sure you are communicating with me when you have questions!
Wherever you are, make sure you are communicating with me when you have questions!
No right answer here!
Yes! All the responses are reasons to make a figure.
aes() function
wday
aes() function
Answers may vary. I’d say c. putting the work in context. Others might say b. facilitating comparison or d. simplifying the story. However, I don’t think a. making the data stand out is a correct answer.
(c) takes the mean() (average) instead of the sum(). The other commands compute the total number of births broken down by year and sex.
filter()
(year, name)
sum(num)
Running the different code chunks with relevant output.
-country
year
gdpval (if possible, it is a good idea to name variables something different from the name of the data frame)
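
To check the claim about variants (a) through (e) without loading the full babynames data, here is a minimal sketch on a made-up five-row tibble. The name `toy` and its values are invented for illustration; it only borrows the column names used above. It assumes dplyr 1.0 or later for the `.groups` argument.

```r
library(dplyr)

# Hypothetical stand-in for babynames with the same column names.
toy <- tibble(
  year = c(2000, 2000, 2000, 2001, 2001),
  sex  = c("F", "F", "M", "F", "M"),
  name = c("Jane", "Mary", "John", "Jane", "John"),
  num  = c(10, 30, 20, 12, 18)
)

# Piped form, as in (a).
piped <- toy |>
  group_by(year, sex) |>
  summarize(totalBirths = sum(num), .groups = "drop")

# Nested form, as in (e): the same computation, read inside-out.
nested <- summarize(group_by(toy, year, sex),
                    totalBirths = sum(num), .groups = "drop")

# Same grouping but mean() instead of sum(), as in (c).
averaged <- toy |>
  group_by(year, sex) |>
  summarize(totalBirths = mean(num), .groups = "drop")

all.equal(piped, nested)   # compare the two sum() versions
piped$totalBirths          # 40 20 12 18
averaged$totalBirths       # 20 20 12 18
```

The only group where the two summaries differ is the one with more than one row (2000, F), which is why sum() and mean() can look identical when every group holds a single row.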